In [58]:
fileR <- dir("../script", ".R$")
for (ii in fileR)
    source(paste0("../script/", ii))
In [59]:
data <- read.table("../data/data_mut_DD.csv", header = TRUE, as.is = TRUE)
head(data)
Out[59]:
Genemut_lofmut_missensemut_damagingdn_damaging_DDdn_lof_DDdn_missense_DDdn_silent_DD
1A1BG9.996657e-072.300224e-054.54062e-070000
2A1BG-AS11.420491e-07000000
3A1CF2.144318e-061.68827e-053.35199e-060000
4A2M3.981797e-064.039597e-057.82154e-060000
5A2M-AS19.14698e-08000000
6A2ML13.779929e-064.007178e-052.85364e-060010
In [60]:
t1 <- as.numeric(data$mut_lof)
t1 <- t1[t1 > 0]
data[data$mut_lof == 0, ]$mut_lof <- min(t1)/10
#data <- data[data$mut_missense >0, ]
t2 <- as.numeric(data$mut_missense)
t2 <- t2[t2 > 0]
data[data$mut_missense == 0, ]$mut_missense <- min(t2)/10
dim(data)

#data <- data[data$mut_missense >0, ]
t2 <- as.numeric(data$mut_damaging)
t2 <- t2[t2 > 0]
data[data$mut_damaging == 0, ]$mut_damaging <- min(t2)/10
Out[60]:
  1. 19358
  2. 8
In [61]:
allDNData <- data[, paste0("dn_", c("damaging", "lof"), "_DD")]
allMutData <- data[,paste0("mut_", c("damaging", "lof"))]
head(data.frame(allMutData, allDNData))
Out[61]:
mut_damagingmut_lofdn_damaging_DDdn_lof_DD
14.54062e-079.996657e-0700
21.04146e-101.420491e-0700
33.35199e-062.144318e-0600
47.82154e-063.981797e-0600
51.04146e-109.14698e-0800
62.85364e-063.779929e-0600
In [62]:
ntrioDD = 4293
N <- list(dn = ntrioDD)
modelDNdata <- list(NN = dim(allDNData)[1], #Gene numbers                                                                                             
					K = 2, #Hypothesis numbers: should be default                                                                 
                                        NCdn = 2, #Number of de novo classes                                                                          
                                        Ndn = array(rep(N$dn, 2)), # Family numbers                                                                   
                                        dataDN = data.frame(allDNData), # Denovo data                                                                 
					mutRate = data.frame(allMutData), # Mutation rates                                                            
                                        betaPars = c(6.7771073, -1.7950864, -0.2168248), #Adjust beta's values: should be default                     
                                        adjustHyperBeta = as.integer(1), ##1 if want to adjust beta, 0 if don't want to adjust beta                   
                                        upperPi0 = 0.5, lowerPi0 = 0, lowerBeta = 0, ##Lower and upper limits of pi: should be default                
                                        lowerHyperGamma = 1, lowerGamma = 1, #Should be default                                                       
					hyperBetaDN0 = array(c(5, 1)), 
                   hyper2GammaMeanDN = array(c(1, 1)), ##Priors for mean RRs: meanRR ~ gamma(hyper2GammaMeanDN, hyper2BetaDN)
                   hyper2BetaDN = array(c(0.05, 0.05)) ##Priors for mean RRs: meanRR ~ gamma(hyper2GammaMeanDN, hyper2BetaDN)
                   )
In [63]:
nIteration = 1000 ##T
nChain = 1
nCore = 1
nThin = floor(nIteration/1000)
mcmcDD <- stan(model_code = DNextTADA, ##Use only De novo model inside extTADA
               data = modelDNdata, ##Model data as described above
               iter = nIteration, ## 
               chains = nChain, 
               cores = nCore,
               thin = nThin)
SAMPLING FOR MODEL '493590f0b50b55273a713acc3c2b8395' NOW (CHAIN 1).

Chain 1, Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1, Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1, Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1, Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1, Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1, Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1, Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1, Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1, Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1, Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1, Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1, Iteration: 1000 / 1000 [100%]  (Sampling)
 Elapsed Time: 40.8516 seconds (Warm-up)
               34.9724 seconds (Sampling)
               75.8239 seconds (Total)

The following numerical problems occured the indicated number of times after warmup on chain 1
                                                                                    count
Exception thrown at line 59: gamma_log: Shape parameter is inf, but must be finite!     1
When a numerical problem occurs, the Hamiltonian proposal gets rejected.
See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
If the number in the 'count' column is small, do not ask about this message on stan-users.
In [64]:
stan_trace(mcmcDD)
In [65]:
mcmcDD
Out[65]:
Inference for Stan model: 493590f0b50b55273a713acc3c2b8395.
1 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=500.

                        mean se_mean    sd     2.5%      25%      50%      75%
pi0                     0.03    0.00  0.00     0.02     0.03     0.03     0.03
hyperGammaMeanDN[1]    23.06    0.29  5.14    14.06    19.47    22.85    26.45
hyperGammaMeanDN[2]    88.58    0.70 11.56    68.08    80.24    88.72    96.02
gammaMeanDN[1]         23.03    0.09  1.47    20.25    21.91    22.96    23.99
gammaMeanDN[2]         92.26    0.41  6.25    80.70    87.98    91.90    96.37
hyperBetaDN[1]          0.83    0.00  0.01     0.81     0.82     0.83     0.83
hyperBetaDN[2]          0.81    0.00  0.00     0.81     0.81     0.81     0.81
lp__                -6795.69    0.09  1.47 -6799.20 -6796.34 -6795.37 -6794.65
                       97.5% n_eff Rhat
pi0                     0.03   247 1.00
hyperGammaMeanDN[1]    34.14   309 1.01
hyperGammaMeanDN[2]   110.22   273 1.00
gammaMeanDN[1]         25.75   285 1.00
gammaMeanDN[2]        105.13   232 1.00
hyperBetaDN[1]          0.85   285 1.01
hyperBetaDN[2]          0.81   284 1.00
lp__                -6793.73   280 1.00

Samples were drawn using NUTS(diag_e) at Tue Dec 13 11:51:51 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
In [66]:
pars0 <- estimatePars(pars = c('pi0',
                               'hyperGammaMeanDN[1]', 'hyperGammaMeanDN[2]',
                               'hyperBetaDN[1]', 'hyperBetaDN[2]'),
                     mcmcResult = mcmcDD)
====
Only pi and hyper parameters are estimated in this step
extTADA does not calculate HPDs for hyper betas, just their medians
===

In [67]:
pars0
Out[67]:
pi00.028212750.023549460.03466213
hyperGammaMeanDN[1]22.7891013.2069134.00535
hyperGammaMeanDN[2] 89.62609 67.42633110.01114
hyperBetaDN[1]0.82516590.82516590.8251659
hyperBetaDN[2]0.80681060.80681060.8068106
In [68]:
par(mfrow = c(1, 2))
plotParHeatmap(pars = c("pi0", "hyperGammaMeanDN[1]"), mcmcResult = mcmcDD)
plotParHeatmap(pars = c("pi0", "hyperGammaMeanDN[2]"), mcmcResult = mcmcDD)
Warning message:
In plot.xy(xy, type, ...): font width unknown for character 0x1Warning message:
In plot.xy(xy, type, ...): font metrics unknown for character 0x1Warning message:
In plot.xy(xy, type, ...): font width unknown for character 0x1Warning message:
In plot.xy(xy, type, ...): font metrics unknown for character 0x1
In [69]:
##Get gene list
geneName <- data[, 1]
##Set parameters: use pars0 above
parsFDR <- list(gammaMeanDN = pars0[, 1][2:3],
                betaDN = pars0[, 1][4:5],
            pi0 = pars0[, 1][1],
            nfamily = rep(ntrioDD, 2))

dataFDR <- calculateFDR(pars = parsFDR,
                       dnData = allDNData, mutData = allMutData,
                       geneName = geneName)
No parameters for case-control data;  therefore, these categories are not calculated in this step.

In [70]:
head(dataFDR)
Out[70]:
geneNamedn_damaging_DDdn_lof_DDmut_damagingmut_lofBFqvalue
681ANKRD110321.12154e-054.918288e-068.849718e+610
1001ARID1B0309.81608e-064.623958e-061.032838e+580
10146MLL1261.19316e-059.084711e-061.019858e+490
347ADNP1192.26581e-062.100249e-062.199936e+380
4832DYRK1A4144.53477e-062.145461e-061.358677e+320
9906MED13L5131.5154e-054.661034e-063.933932e+290
In [71]:
dim(dataFDR[dataFDR$qvalue < 0.1, ])
dim(dataFDR[dataFDR$qvalue < 0.05, ])
Out[71]:
  1. 199
  2. 7
Out[71]:
  1. 163
  2. 7
In [ ]: